Military and Aerospace Programmable Logic Devices (MAPLD) Conference Sept. 15-18, 2008, Annapolis, MD



# Reformation, Formulation, and the Future of Reconfigurable Computing

### Alan D. George, Ph.D.

#### **Director, NSF CHREC Center**

Professor of ECE, University of Florida



### Outline

- What is CHREC?
- Architecture Reformation
- Application Reformation
- Formulation
- Conclusions







# What is CHREC?











### What is CHREC?



- NSF Center for High-Performance Reconfigurable Computing
  - Unique US national research center in this field, established Jan'07
  - Leading research groups in RC/HPC/HPEC @ four major universities
    - University of Florida (lead)
    - Brigham Young University
    - George Washington University
    - Virginia Tech



- Industry/University Cooperative Research Center
  - CHREC is supported by CISE & Engineering Directorates @ NSF
- CHREC is both a National Center and a Research Consortium
  - University groups serve as research base (faculty, students, staff)
  - Industry & government organizations are research partners, sponsors, collaborators, advisory board, & technology-transfer recipients





**BLUE** = founding member since 2007; **ORANGE** = new member in 2008

More than 30 members with more than 40 memberships in 2008

### **CHREC** Members



#### National Science Foundation



1. **AFRL Munitions Directorate**
2. **AFRL Space Vehicles Directorate**
3. Altera
4. AMD
5. **Arctic Region Supercomputing Center**
6. **Boeing**
7. Cadence
8. **GE Aviation Systems**
9. Gedae
10. Harris Corp.
11. Hewlett-Packard
12. Honeywell
13. **IBM Research**
14. Intel
15. L-3 Communications
16. Lockheed Martin MFC
17. Lockheed Martin SSC
18. Los Alamos National Laboratory
19. Luna Innovations
20. **NASA Goddard Space Flight Center**
21. **NASA Langley Research Center**
22. **NASA Marshall Space Flight Center**
23. National Instruments
24. **National Reconnaissance Office**
25. **National Security Agency**
26. **Network Appliance**
27. **Office of Naval Research**
28. Raytheon
29. **Rincon Research Corp.**
30. **Rockwell Collins**
31. Sandia National Laboratory NM















### **CHREC** Faculty

#### University of Florida (lead)

- Dr. Alan D. George, Professor of ECE, Center Director
- Dr. Herman Lam, Associate Professor of ECE
- Dr. K. Clint Slatton, Assistant Professor of ECE and CCE
- Dr. Ann Gordon-Ross, Assistant Professor of ECE
- Dr. Greg Stitt, Assistant Professor of ECE
- Dr. Saumil Merchant, Post-doc Research Scientist

#### Brigham Young University

- Dr. Brent E. Nelson, Professor of ECE, BYU Site Director
- Dr. Michael J. Wirthlin, Associate Professor of ECE
- Dr. Brad L. Hutchings, Professor of ECE
- Dr. Michael Rice, Professor of ECE

#### George Washington University

- Dr. Tarek El-Ghazawi, Professor of ECE, GWU Site Director
- Dr. H. Howie Hwang, Assistant Professor of ECE
- Dr. Vickram Narayana, Dr. Proshanta Saha, and Dr. Harald Simmler, Post-doc Research Scientists

#### Virginia Tech

- Dr. Peter Athanas, Professor of ECE, VT Site Director
- Dr. Wu-Chun Feng, Associate Professor of CS and ECE
- Dr. Francis K.H. Quek, Professor of CS





CHREC features a strong team of >40 graduate students spanning our four university sites.





### **2008 CHREC Projects**



#### Fault Tolerance (3)

- Reconfigurable Fault Tolerance and Partial RTR (F4)
- High-Reliability Design Tools & Techniques (B3)
- Reliable RC DSP/Comm Systems (B4)

#### **Device Studies** (4)


- Device Characterization (F5)
- Heterogeneous Architectures for HPEC RC (B2)
- Process-to-Core Mapping for Adv. Architectures (V2)
- Partial RTR for HPRC (G7)

#### Productivity Concepts (4)

- System-Level Formulation (F1)
- Model-Based Engineering Framework (V1)
- Runtime Performance Analysis (F2)
- Intelligent Deployment of IP Cores (G6)

#### **Productivity Studies** (3)

- Case Studies in Multi-FPGA App Design (F3)
- Library Portability for HLL Acceleration Cores (G5)
- Core Library Framework (B1)

(where F=Florida, B=BYU, G=GWU, V=VaTech)



# Architecture Reformation





### **Multicore and Manycore**

### "RC was multicore when multicore wasn't cool."



"I was country when country wasn't cool." – Barbara Mandrell





### **Architecture Reformation**

- End of wave (Moore's Law) riding f<sub>clk</sub> + ILP (CPU)
  - Explicit parallelism & multicore the new wave
- Many promising technologies on new wave
  - Fixed & reconfigurable multicore device architectures
- Many R&D challenges lie on new wave
  - Tried & true methods no longer sufficient; complexity abounds
  - Semantic gap widening between applications & systems
    - e.g. App developers must now understand & exploit parallelism
- Inherent traits of fixed device architectures (FMC)
  - App-specific: inflexible, expensive (e.g. ASIC)
  - App-generic: power, cooling, & speed challenges (CPU)
  - Many niches between extremes (Cell, DSP, GPU, NP, etc.)
- Reconfigurable architectures promise best of both worlds
  - Speed, flexibility, low-power, adaptability, economy of scale, size
  - Bridging embedded & general-purpose computing, superset of fixed



FMC = fixed multicore/manycore devices; RMC = reconfigurable multicore/manycore devices







### What is a Reconfigurable Computer?

- System capable of changing hardware structure to address application demands
  - Static or dynamic reconfiguration
  - Reconfigurable computing, configurable computing, custom computing, adaptive computing, etc.
  - Often a mix of conventional fixed & reconfigurable devices (e.g. control-flow CPUs, data-flow FPLDs)
- Enabling technology?
  - Field-programmable multicore devices
  - FPGA is "King" (but space is broadening)
- **Applications?**
  - Vast range across computing and embedded worlds
  - Faster, smaller, less power & heat, adaptable & versatile, selectable precision, high comp. density













### **Opportunities for RC?**










### When and Where to Apply RC?

- When do we need?
  - When performance & versatility are critical
    - Hardware gates targeted to application-specific requirements
    - System mission or applications change over time
  - When the environment is restrictive
    - Limited power, weight, area, volume, etc.
    - Limited communications bandwidth for work offload
  - When autonomy and adaptivity are paramount
- Where do we need?
  - In conventional servers, clusters, and supercomputers (HPC)
    - Field-programmable hardware fits many demands
    - High DoP, finer grain, direct data-flow mapping, bit manipulation, selectable precision, direct control over H/W (e.g. perf. vs. power)
  - In space, air, sea, undersea, and ground systems (HPEC)
    - Embedded & deployable systems can reap many advantages w/ RC









### **Reconfigurability Factors**


### **Future Convergence**

- Rising development costs & other factors drive convergence
  - As seen in many other technologies
- Device architecture convergence?
  - Manycore is driven by densities
  - Heterogeneous?
    - Cell as initial example
    - Intel and AMD both cite heterogeneous MC in their future
    - To extent complexity is manageable
  - Reconfigurable
    - Performance + energy + versatility
    - Adaptive for many apps, missions
    - Ideal for long life-cycle systems
    - Avoids limitations of fixed architectures
    - Must manage issues of heterogeneity







### **RC: Vital Technologies for Future**

- Mission versatility (adapt as needs change)
  - Fixed devices are burdened with fixed choices, limited tradeoffs, cannot adapt over long lifecycle
- Mission performance (speed, power, etc.)
  - One of several metrics under study @ CHREC (F5-08) is Computational Density per Watt (CDW)
  - e.g. on CDW, FPGA devices found consistently superior to FMC devices (CPU, Cell, GPU, etc.)
  - See RSSI'08 paper (and upcoming HPEC'08 talk) for details w/ HPC & HPEC devices, respectively

RSSI'08: FPGA consistently best in class (CDW)

- Bit-level Gops/W (~28× vs. best FMC)
- 16-bit integer Gops/W (~17×)
- 32-bit integer Gops/W (~8×)
- 32-bit float Gops/W (~4×)
- 64-bit float Gops/W (~2×)
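To make the CDW metric concrete, here is a minimal Python sketch of how a Gops/W figure could be computed for one device at one precision. It assumes computational density is estimated as operations per cycle times clock frequency, which may differ from the exact method in the RSSI'08 paper. The `Device` class, its field names, and both example devices are hypothetical placeholders, and the numbers are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """One device at one precision; all fields are illustrative inputs."""
    name: str
    ops_per_cycle: float  # operations completed per clock cycle (assumed sustained)
    clock_hz: float       # operating clock frequency in Hz
    power_watts: float    # device power draw in watts

    def gops(self) -> float:
        """Computational density in giga-operations per second."""
        return self.ops_per_cycle * self.clock_hz / 1e9

    def cdw(self) -> float:
        """Computational density per watt, in Gops/W."""
        return self.gops() / self.power_watts


# Hypothetical placeholder devices; these numbers are NOT from the RSSI'08 study.
rmc = Device("hypothetical RMC", ops_per_cycle=2000, clock_hz=200e6, power_watts=25)
fmc = Device("hypothetical FMC", ops_per_cycle=16, clock_hz=3e9, power_watts=120)

for dev in (rmc, fmc):
    print(f"{dev.name}: {dev.gops():.1f} Gops, {dev.cdw():.2f} Gops/W")

print(f"CDW ratio (RMC/FMC): {rmc.cdw() / fmc.cdw():.1f}x")
```

The sketch shows why a device with a much lower clock rate can still lead on this metric: wide parallelism raises operations per cycle while a lower clock and smaller power budget shrink the denominator.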



#### **RSSI'08: Devices Studied**

| Process | Class | Device                      |
|---------|-------|-----------------------------|
| 130 nm  | FMC   | ClearSpeed CSX600           |
| 130 nm  | FMC   | Freescale PowerPC MPC7447   |
| 90 nm   | RMC   | Altera Stratix-II EP2S180   |
| 90 nm   | RMC   | ElementCXI ECA-64           |
| 90 nm   | RMC   | Mathstar Arrix FPOA         |
| 90 nm   | RMC   | Raytheon MONARCH            |
| 90 nm   | RMC   | Tilera TILE64               |
| 90 nm   | RMC   | Xilinx Virtex-4 LX200       |
| 90 nm   | RMC   | Xilinx Virtex-4 SX55        |
| 90 nm   | FMC   | IBM Cell BE                 |
| 90 nm   | FMC   | Intel Xeon 7041             |
| 90 nm   | FMC   | Nvidia Tesla C870           |
| 65 nm   | RMC   | Altera Stratix-III EP3SL340 |
| 65 nm   | RMC   | Altera Stratix-III EP3SE260 |
| 65 nm   | RMC   | Xilinx Virtex-5 LX330T      |
| 65 nm   | RMC   | Xilinx Virtex-5 SX95T       |
| 65 nm   | FMC   | Intel Xeon X3230            |



J. Williams, A. George, J. Richardson, K. Gosrani, and S. Suresh, "Computational Density of Fixed and Reconfigurable Multi-Core Devices for Application Acceleration," *Proc. of Reconfigurable Systems Summer Institute 2008* (RSSI), Urbana, IL, July 7-10, 2008.



# Application Reformation





### "If You Build It, They Will Come"















Source: http://i.cdn.turner.com/sivault/image/2001/06/17/001234484.jpg





### **Application Reformation**

- Dawn of reformation in application development methods
  - Driven by architecture reformation; complexity management
  - Holistic concepts, methods, & tools must emerge
- Semantic gap widening between apps & archs
  - MC world (fixed or RC), explicit parallelism
    - Architectures increasingly complex to target by apps
    - New to fixed MC world, familiar to RC/FPGA & HPC worlds
  - Optimizing compiler ≠ parallelizing compiler
    - **Domain scientist** involved in comp. structure of their app
- How do we bridge semantic gap?
  - Focus upon computational fundamentals
    - Formal models, complexity management via abstraction, encapsulation
  - Learn lessons from other engineering fields
    - e.g. aerospace engineers do not flight-test first, why must we?
  - Build basis for an RC engineering discipline
    - Leverage where practical for fixed MC world











### DARPA Studies @ CHREC


- Research roadmaps for app development on FPGA systems
  - Bridging app/arch semantic gap
    - Prevalent challenge of multicore
  - RC to revolutionize DoD missions
- DARPA studies by CHREC
  - Two independent studies
  - Roadmap results integrated
- Focus areas
  - Study underlying tools limitations
    - Theory, practice, technologies
  - Formulate strategic research paths
    - Revolutionary, impactful
  - Craft proposed research roadmaps
    - Highlight DARPA-hard challenges

#### Titles of Two Studies for DARPA

- Exploration of a Research Roadmap for Application Development & Execution on FPGA-based Systems
- Future FPGA Design Methodologies and Tool Flows

#### Update: Workshop held on 6/05/08

- Sponsored by DARPA
- Capstone event for both studies
- >50 experts in attendance
  - Morning presentations
  - Afternoon breakout groups
- Outcome: program research roadmap
  - Integration of both studies








### **Key Questions for DARPA Program**

- Why is FPGA-based reconfigurable computing (RC) of increasingly critical importance to DoD?
  - Performance, power, versatility, weight, size, cost
    - 2007 formation of CHREC is proof: >30 industry, government, & university research partners, many DoD-related
- What is #1 challenge of RC for DoD?
  - Programmability: limiting factor, semantic gap
    - From deployed systems for warfighters to DoD supercomputing
- Is this challenge unique to RC for DoD?
  - <u>Absolutely not</u>: in general, all multicore architectures (FMC and RMC) are facing similar fundamental issues
    - How to productively express & exploit hardware parallelism in a manner suitable for app developers including domain scientists?






# Formulation

Starting point for reformation in application development





### **Principal Challenge is Complexity**

#### Seat-of-pants formulation

"Sail west until landfall made, all the while hoping that you don't fall off the earth."


#### **Strategic formulation**

"Strategically explore various approaches, predict outcomes, study tradeoffs, choose best."





### **FDTE Model**

#### I. Formulation

- Strategic exploration
  - Not coding in traditional sense
- Parallel algorithm exploration
  - Control structures (wide, deep)
  - Data structures (elements, precision, layout)
- Parallel architecture exploration
  - As mapping targets of parallel algorithm
  - Base characteristics (e.g. DoP, OPS, B/W)
- High-level performance prediction
  - Supports tradeoff analysis (alg, arch, both)
  - Memory hierarchy, data locality, bottlenecks
  - Analytical, simulative, or combo
- Feeder to Design phase
  - Patterns, templates, code generation, libraries
- <u>Theme</u>: strategic design decisions
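As one illustration of analytical high-level performance prediction, the Python sketch below estimates execution time for a candidate mapping from a few base characteristics: data moved, link bandwidth, operation count, degree of parallelism, and clock rate. It is a generic back-of-envelope model under stated assumptions (communication and computation do not overlap, and throughput is sustained), not the specific method used at CHREC. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Predicted communication and computation times, in seconds."""
    t_comm: float
    t_comp: float

    @property
    def t_total(self) -> float:
        # Assumption: communication and computation are not overlapped.
        return self.t_comm + self.t_comp

    @property
    def bottleneck(self) -> str:
        return "communication" if self.t_comm > self.t_comp else "computation"

    def speedup(self, t_baseline: float) -> float:
        """Speedup relative to a measured or estimated baseline time."""
        return t_baseline / self.t_total


def predict(bytes_moved: float, link_bytes_per_s: float,
            n_elements: float, ops_per_element: float,
            ops_per_cycle: float, clock_hz: float) -> Prediction:
    """Estimate times from base characteristics (B/W, OPS, DoP)."""
    t_comm = bytes_moved / link_bytes_per_s
    t_comp = (n_elements * ops_per_element) / (ops_per_cycle * clock_hz)
    return Prediction(t_comm, t_comp)


# Hypothetical design-space sweep over degree of parallelism (ops per cycle).
T_BASELINE = 0.28  # assumed software baseline time in seconds
for dop in (10, 50, 250):
    p = predict(bytes_moved=8e6, link_bytes_per_s=1e9,
                n_elements=1e6, ops_per_element=100,
                ops_per_cycle=dop, clock_hz=100e6)
    print(f"DoP={dop}: total={p.t_total * 1e3:.1f} ms, "
          f"speedup={p.speedup(T_BASELINE):.1f}x, bottleneck={p.bottleneck}")
```

Even a model this coarse supports the kind of tradeoff analysis the slide describes: in the sweep, raising parallelism eventually shifts the bottleneck from computation to communication, which is worth knowing before any code is written.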



"We need a change in mindset, not simply another programming language."

Formulation → Design → Translation → Execution





### FDTE Model (continued)

#### II. Design

- Linguistic design semantics & syntax
- Graphical design semantics & syntax
- Hardware/software coding, co-design

#### III. Translation

- Compilation
- Libraries & linkage
- Technology mapping (synthesis, PAR)

#### IV. Execution

- Test, debug, & verification
- Performance analysis & optimization
- Run-time services



**Execution Services** 



DTE phases traditionally used for "seat of pants" formulation, but increasingly *inefficient* and *inappropriate*.

### **ART Model**



#### Abstraction

Reduce detail required to specify computations by raising design abstractions

- Leverage emerging concurrent models of computation
- Remove circuit-level details
- Support multi-FPGA synthesis

#### Reuse

Substantially increase amount of design reuse at all levels of design flow

- Library reuse standards
- Dual-layer compilation
- Interface synthesis

#### Turns per day

Increase ease of design debug & deployment via many more "turns per day"

- Platform services
- Firmware
- High-level abstraction debug


#### **Integrated Research Roadmap for Proposed DARPA Program**



Projected productivity impact on order of 20×





# Conclusions





### **Future of RC**

- Determined by outcome of two Reformations
  - Architecture Reformation news is very encouraging
    - RMC inherently superior to FMC devices (CPU, GPU, Cell, etc.) when performance, energy, & versatility are all paramount (e.g. HPC & HPEC)
  - <u>Application Reformation</u> outcome is TBD
    - Historically, more complex to target apps to RMC devices
    - For RC to become a full-fledged paradigm of computing, must overcome major challenges in app dev productivity
  - Each reform relates to half of productivity ratio
    - Utility of technology vs. cost of development: $\Psi = \frac{U}{C}$
    - Excel with U
    - Compete on C

- Predicting the future
  - R&D success with application reformation for RC?
    - YES: Development costs drop, user domain vastly expands, big impact! ☺
    - **NO**: Remains as niche technology, perhaps also appliance technology





### Conclusions

- RC making inroads in ever-broadening areas
  - HPC and HPEC; from satellites to supercomputers!
- As with any new field, early adopters are brave at heart
  - Face challenges with design methods, tools, apps, systems, etc.
  - Fragmented technologies with gaps and proprietary limitations
- Research & technology challenges abound
  - Productivity, device/system arch., FT, RTR, PR, etc.
  - CHREC sites & partners leading key R&D projects
- Industry/university collaboration is critical to meet challenges
  - Incremental, evolutionary advances will not lead to ultimate success
  - Researchers must take more risks, explore & solve tough problems
  - Industry & government as partners, catalysts, tech-transfer recipients







### Acknowledgements

We express our gratitude for support of CHREC by

- National Science Foundation
  - Program managers & assistants, center evaluator, panel reviewers
- CHREC Industry and Government Partners
  - >30 members holding >40 memberships in 2008
- University administrations @ CHREC sites
  - University of Florida
  - Brigham Young University
  - George Washington University
  - Virginia Tech
- Equipment and tools vendors providing support
  - Aldec, Altera, Celoxica, Cray, DRC, Gedae, GiDEL, Impulse, Intel, Mellanox, Nallatech, SGI, SRC, Synplicity, Voltaire, XDI, Xilinx









### Thanks for Listening!

### For more info:

- www.chrec.org
- george@chrec.org

### Questions?








#### Home



Under the auspices of the highly acclaimed program for <u>Industry/University Cooperative Research Centers (I/UCRC)</u> at the <u>National Science Foundation</u>, the longest-running program at NSF, a national center and consortium has been founded known as the NSF Center for High-Performance Reconfigurable Computing (CHREC, pronounced "shreck"). CHREC is comprised of more than 30 organizations from the academic, industry, and government sectors with synergistic interests and goals in reconfigurable, adaptive computing for a broad range of missions, from satellites to supercomputers. After a two-year development and selection process at NSF, CHREC became operational in January 2007. The Center is comprised of four research sites, each a major university with a leading research group in this field, coupled with NSF and more than 30 industry and government partners that influence, collaborate, and benefit in the research with technology transfer. As is the nature of an I/UCRC, each industry or government partner supports CHREC with one or more Center memberships, where each membership is commensurate with a slot to fund one graduate student at one of the four sites. In 2008, members sponsor more than 40 memberships in CHREC.

A broad range of goals have been defined with NSF for CHREC, including: (1) Serve as the nation's first and foremost multidisciplinary research center in reconfigurable high-performance computing as a basis for long-term partnership and collaboration amongst industry, academe, and government; (2) Directly support the research needs of industry and government partners in a cost-effective manner with pooled, leveraged resources and maximized synergy; (3) Enhance the educational experience for a diverse set of high-quality graduate and undergraduate students; and (4) Advance the knowledge and technologies in this emerging field and ensure relevance of the research with rapid and effective technology transfer.

#### CHREC Sites

- University of Florida (lead)
- George Washington University
- Brigham Young University
- Virginia Tech



#### CHREC Partners

- AFRL Munitions Directorate
- AFRL Space Vehicles Directorate
- <u>Altera</u>
- Arctic Supercomputing Center
- <u>AMD</u>
- <u>Boeing</u>
- <u>Cadence</u>
- GE Aviation Systems
- <u>Gedae</u>
- Harris
- Hewlett-Packard
- Honeywell
- IBM Research
- Intel
- L-3 Communications
- Lockheed Martin MFC
- Lockheed Martin SSC
- Los Alamos National Lab




